General Information


This post is a brief Situation Report (SITREP) on concerning developments regarding a new strain of SARS-CoV-2, the virus that causes the COVID-19 global pandemic.

As of 01/16/2021, 17873 sequences of this strain have been detected in over 38 countries (see the table for its definition). In the US, 81 sequences have been detected in over 13 states.

This report follows a bottom-up approach: it starts with background information, followed by prevalence at the state, national, and global levels, in that order. Table 1.1 lists the key mutations that define the strain. Figure 1.1 compares the genetic distance, a measure of evolutionary change, between B.1.1.7 and related non-B.1.1.7 samples.

Table 1.1: Key mutations that define the strain

| Gene | Nucleotide Mutations | Amino Acid Changes |
|------|----------------------|--------------------|
| ORF1ab | C3266T, T6953C, C5387A, 11288:11296 deletion | T1001I, I2230T, A1708D |
| S | A23062T, C23270A, A23402G, C23603A, C23708T, T24505G, G24913C | DEL69-70, DEL144Y, N501Y, A570D, D614G, P681H, T716I, S982A, D1118H |
| N | GAT28279CTA, C28976T | D3L, S235F |
| ORF8 | G28047T, C27971T, A28110G | R52I, Q27_, Y73C |

Figure 1.1 shows the genetic distance (root-to-tip), a measure of evolutionary change, for strain and related non-strain samples (excluding other well-known VOCs, e.g. B.1.351).
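
As a rough illustration of how Table 1.1 could be used to flag candidate samples, the sketch below stores the amino acid changes from the table and reports what fraction of them a sample carries. The `GENE:CHANGE` string format, the function names, and the 0.8 threshold are assumptions made for this example; they are not the definition used to produce the counts in this report.

```python
# Amino acid changes copied from Table 1.1, keyed by gene.
KEY_MUTATIONS = {
    "ORF1ab": {"T1001I", "I2230T", "A1708D"},
    "S": {"DEL69-70", "DEL144Y", "N501Y", "A570D", "D614G",
          "P681H", "T716I", "S982A", "D1118H"},
    "N": {"D3L", "S235F"},
    "ORF8": {"R52I", "Q27_", "Y73C"},
}


def expected_changes():
    """Flatten the table into a set of 'GENE:CHANGE' labels (hypothetical format)."""
    return {
        f"{gene}:{change}"
        for gene, changes in KEY_MUTATIONS.items()
        for change in changes
    }


def matched_fraction(sample_mutations):
    """Fraction of the Table 1.1 amino acid changes present in a sample."""
    expected = expected_changes()
    return len(expected & set(sample_mutations)) / len(expected)


def looks_like_strain(sample_mutations, threshold=0.8):
    """Flag a sample when it carries most key changes (threshold is illustrative)."""
    return matched_fraction(sample_mutations) >= threshold
```

A sample carrying only `S:D614G`, which is also common outside this strain, matches one of the 17 changes and is not flagged.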

State Prevalence


Figure 2.1 shows the spatial (geographical) prevalence of the strain across California.
Figure 2.2 shows the temporal (over time) prevalence of the strain across California.

National Prevalence


Figure 3.1 shows the spatial (geographical) prevalence of the strain across the US.
Figure 3.2 shows the temporal (over time) prevalence of the strain across the US.

Global Prevalence

More detailed information on global prevalence of SARS-CoV-2 strains of concern can be found in this post.

Figure 4.1 shows the spatial (geographical) prevalence of the strain across the world.
Figure 4.2 shows the temporal (over time) prevalence of the strain across the world.
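
The temporal figures in this report could be built from a running total of strain sequences by collection date. The sketch below shows one way to compute such a running total; how the actual figures were produced is not described here, so the function name and input format are assumptions.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_counts(collection_dates):
    """Return (date, running total) pairs, ordered by collection date."""
    daily = Counter(collection_dates)      # sequences collected on each date
    days = sorted(daily)                   # chronological order
    totals = accumulate(daily[d] for d in days)
    return list(zip(days, totals))


# Hypothetical collection dates for three strain sequences.
example = [date(2021, 1, 2), date(2021, 1, 1), date(2021, 1, 2)]
print(cumulative_counts(example))
```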

Notes on Sampling


As Figure 3.2 indicates, the B.1.1.7 genomes found in the US so far were not the result of unbiased sequencing; they were identified through S-gene target failures (SGTF) in community-based diagnostic PCR testing. Because this approach is biased, the counts do not indicate the true prevalence of the B.1.1.7 lineage in the US. They only tell us that the lineage is present in the US.
P.S.: Estimates of true prevalence in the US are discussed in this post.
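
To see why SGTF-targeted sequencing cannot be read as prevalence, consider a small worked example. The numbers below are invented for illustration, and the sketch assumes that every strain sample shows SGTF and that some non-strain samples do as well.

```python
def strain_fraction(n_strain, n_other_sgtf, n_non_sgtf, targeted):
    """Fraction of sequenced genomes that belong to the strain.

    targeted=True  -> only SGTF samples are sequenced (biased)
    targeted=False -> every positive sample is equally likely to be sequenced
    """
    if targeted:
        sequenced = n_strain + n_other_sgtf
    else:
        sequenced = n_strain + n_other_sgtf + n_non_sgtf
    return n_strain / sequenced


# Hypothetical pool of 1000 positive samples: 5 strain, 15 other SGTF, 980 non-SGTF.
biased = strain_fraction(5, 15, 980, targeted=True)     # 5 / 20
unbiased = strain_fraction(5, 15, 980, targeted=False)  # 5 / 1000
print(f"targeted: {biased:.1%}, unbiased: {unbiased:.1%}")
```

In this made-up scenario the targeted approach finds the strain in 25% of sequenced genomes even though it makes up only 0.5% of positive samples.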

The figure above is a simple illustration of how genomic surveillance of COVID-19 samples could give us an increasingly clear picture of how the virus is evolving and spreading. The pictures are electron microscopy images of SARS-CoV-2 that are "crappified" (with salt-and-pepper noise) to varying degrees depending on the rate of COVID-19 sequencing at each location. As a reference, we include a clear picture on the right to indicate that a 5% genomic sampling rate would be an ideal first objective for observing statistically significant phenomena.
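
For readers curious about the effect, here is a minimal sketch of salt-and-pepper noise on a grayscale image stored as nested lists. The linear mapping from sampling rate to noise level is a hypothetical choice for this example; the mapping used for the figure is not stated.

```python
import random


def noise_for_sampling_rate(sampling_rate, target_rate=0.05):
    """Hypothetical mapping: no noise at or above the 5% target,
    rising linearly to full noise when nothing is sequenced."""
    return max(0.0, 1.0 - sampling_rate / target_rate)


def salt_and_pepper(image, noise_fraction, seed=0):
    """Replace roughly noise_fraction of the pixels with black (0) or white (255)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < noise_fraction:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


# A location sequencing 1% of cases gets a much noisier image than one at 5%.
image = [[128] * 8 for _ in range(8)]
noisy = salt_and_pepper(image, noise_for_sampling_rate(0.01))
```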

Closing Statements and Comments


We conclude with brief comments on how we may be able to improve the genomic surveillance signal of SARS-CoV-2.

  • Research laboratories across the US are encouraged to contribute to COVID-19 genomic sequencing efforts
    • complete the metadata entry when submitting to public databases
    • indicate whether the sample was sequenced at random (unbiased) or not (biased) under fields `purpose_of_sequencing` or `Additional host information` on GISAID, for example
      (i.e. the submitter should specify the generative process that led to each observation)
    • report, as accurately as possible, collection dates and locations of samples (e.g. counties in the US)
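
The recommendations above could be checked before submission with something like the sketch below. Only `purpose_of_sequencing` is a field name taken from this post; the `collection_date` and `location` keys, the function name, and the date format are assumptions for illustration and do not reflect any database's actual schema.

```python
from datetime import datetime

# Hypothetical record keys; only purpose_of_sequencing is named in this post.
REQUIRED_FIELDS = ("purpose_of_sequencing", "collection_date", "location")


def metadata_problems(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    collection_date = record.get("collection_date")
    if collection_date:
        try:
            # Require year, month, and day rather than a partial date.
            datetime.strptime(collection_date, "%Y-%m-%d")
        except ValueError:
            problems.append("collection_date is not a complete year-month-day date")
    return problems


record = {
    "purpose_of_sequencing": "S-gene target failure follow-up",
    "collection_date": "2021-01",
    "location": "USA / California / San Diego County",
}
print(metadata_problems(record))
```

Here the record is flagged because its collection date lacks a day.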